
    Recursive Training of 2D-3D Convolutional Networks for Neuronal Boundary Detection

    Efforts to automate the reconstruction of neural circuits from 3D electron microscopic (EM) brain images are critical for the field of connectomics. An important computation for reconstruction is the detection of neuronal boundaries. Images acquired by serial section EM, a leading 3D EM technique, are highly anisotropic, with inferior quality along the third dimension. For such images, the 2D max-pooling convolutional network has set the standard for performance at boundary detection. Here we achieve a substantial gain in accuracy through three innovations. First, following the trend towards deeper networks for object recognition, we use a much deeper network than previously employed for boundary detection. Second, we incorporate 3D as well as 2D filters, to enable computations that use 3D context. Finally, we adopt a recursively trained architecture in which a first network generates a preliminary boundary map that is provided as input, along with the original image, to a second network that generates a final boundary map. Backpropagation training is accelerated by ZNN, a new implementation of 3D convolutional networks that uses multicore CPU parallelism for speed. Our hybrid 2D-3D architecture could be more generally applicable to other types of anisotropic 3D images, including video, and our recursive framework to any image labeling problem.
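
    A minimal sketch of the recursive 2D-3D idea, written in PyTorch for illustration: layer counts and widths below are placeholders, and the paper's actual networks were trained with ZNN rather than PyTorch.

```python
# Illustrative PyTorch sketch of the recursive 2D-3D architecture described
# above. Layer sizes are placeholders, not the paper's configuration.
import torch
import torch.nn as nn

class Hybrid2D3DNet(nn.Module):
    """Mixes 2D (1xKxK) and 3D (KxKxK) filters over an anisotropic volume."""
    def __init__(self, in_channels):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 24, kernel_size=(1, 3, 3), padding=(0, 1, 1)),  # 2D filters
            nn.ReLU(inplace=True),
            nn.Conv3d(24, 24, kernel_size=(3, 3, 3), padding=1),                   # 3D filters
            nn.ReLU(inplace=True),
            nn.Conv3d(24, 1, kernel_size=1),
        )

    def forward(self, x):
        return torch.sigmoid(self.features(x))  # boundary probability map

# Recursive wiring: the second net sees the raw image plus the first net's
# preliminary boundary map, concatenated along the channel dimension.
net1 = Hybrid2D3DNet(in_channels=1)
net2 = Hybrid2D3DNet(in_channels=2)

image = torch.randn(1, 1, 16, 128, 128)          # (batch, channel, z, y, x)
preliminary = net1(image)
final = net2(torch.cat([image, preliminary], dim=1))
```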

    PZnet: Efficient 3D ConvNet Inference on Manycore CPUs

    Convolutional nets have been shown to achieve state-of-the-art accuracy in many biomedical image analysis tasks. Many tasks in the biomedical analysis domain involve analyzing volumetric (3D) data acquired by CT, MRI, and microscopy. To deploy convolutional nets in practical working systems, it is important to solve the efficient inference problem: one should be able to apply an already-trained convolutional network to many large images using limited computational resources. In this paper we present PZnet, a CPU-only engine that can be used to perform inference for a variety of 3D convolutional net architectures. PZnet outperforms MKL-based CPU implementations of PyTorch and TensorFlow by more than 3.5x for the popular U-Net architecture. Moreover, for 3D convolutions with low feature-map counts, cloud CPU inference with PZnet outperforms cloud GPU inference in terms of cost efficiency.
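
    For context, the following is a minimal sketch of the kind of MKL-backed PyTorch CPU inference that PZnet is benchmarked against; PZnet's own interface is not shown here. The small feature-map counts mirror the regime where the abstract reports the largest advantage.

```python
# Baseline illustration only: an MKL-backed PyTorch forward pass over a small
# 3D convolutional block on CPU. This is the kind of implementation PZnet is
# compared against, not PZnet itself.
import time
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=3, padding=1),   # deliberately low feature-map
    nn.ReLU(inplace=True),                       # counts, the regime where the
    nn.Conv3d(8, 8, kernel_size=3, padding=1),   # abstract reports PZnet's
    nn.ReLU(inplace=True),                       # largest cost advantage
).eval()

volume = torch.randn(1, 1, 64, 256, 256)         # a modest 3D input volume
with torch.no_grad():
    start = time.perf_counter()
    out = block(volume)
    elapsed = time.perf_counter() - start
print(f"forward pass over {tuple(volume.shape)}: {elapsed:.2f}s")
```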

    LoopTune: Optimizing Tensor Computations with Reinforcement Learning

    Advanced compiler technology is crucial for enabling machine learning applications to run on novel hardware, but traditional compilers fail to deliver performance, popular auto-tuners have long search times, and expert-optimized libraries introduce unsustainable costs. To address this, we developed LoopTune, a deep reinforcement learning compiler that optimizes tensor computations in deep learning models for the CPU. LoopTune optimizes tensor traversal order while using the ultra-fast, lightweight code generator LoopNest to perform hardware-specific optimizations. With a novel graph-based representation and action space, LoopTune speeds up LoopNest by 3.2x, generating code an order of magnitude faster than TVM, 2.8x faster than MetaSchedule, and 1.08x faster than AutoTVM, consistently performing at the level of the hand-tuned library NumPy. Moreover, LoopTune tunes code in a matter of seconds.
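
    The search problem can be made concrete with a toy tuner: the brute-force sweep over loop permutations below stands in for LoopTune's learned policy and graph-based representation, which are not shown here. In interpreted Python the timing differences are small; the large gains reported above come from compiled code generated by LoopNest.

```python
# Toy illustration of the search space LoopTune navigates: pick a tensor
# traversal (loop) order by measured runtime. Exhaustive search stands in
# for the reinforcement-learning policy; this is not LoopTune's API.
import itertools
import time
import numpy as np

N = 64
A, B = np.random.rand(N, N), np.random.rand(N, N)

def matmul_with_order(order):
    """Naive matmul whose three loops run in the given order, e.g. ('k','i','j')."""
    C = np.zeros((N, N))
    ranges = {"i": range(N), "j": range(N), "k": range(N)}
    for a in ranges[order[0]]:
        for b in ranges[order[1]]:
            for c in ranges[order[2]]:
                idx = dict(zip(order, (a, b, c)))
                i, j, k = idx["i"], idx["j"], idx["k"]
                C[i, j] += A[i, k] * B[k, j]
    return C

def time_order(order):
    start = time.perf_counter()
    matmul_with_order(order)
    return time.perf_counter() - start

# The "action space": all six orderings of the i, j, k loops.
timings = {order: time_order(order) for order in itertools.permutations("ijk")}
best = min(timings, key=timings.get)
print("fastest loop order:", "".join(best), f"({timings[best]:.3f}s)")
```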

    Efficient watershed algorithm implementation for large affinity graphs

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 43). In this thesis, I designed and implemented an efficient, parallel, generalized watershed algorithm for hierarchical segmentation of affinity graphs. By introducing four variable parameters, the algorithm enables us to use prior knowledge about the input graph in order to achieve better results. The algorithm is well suited for hierarchical segmentation of large-scale 3D images of brain tissue obtained by electron microscopy, making it an essential tool for reconstructing the brain's neural networks, called connectomes. The algorithm was fully implemented in C++ and tested on the largest affinity graph currently available, of size 90GB, to which no existing watershed implementation could be applied. by Aleksandar Zlateski. M.Eng.
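
    The core of watershed-style segmentation on an affinity graph can be sketched with a union-find structure. The version below uses a single affinity threshold, a deliberate simplification of the four-parameter generalized algorithm described in the thesis.

```python
# Simplified sketch: single-linkage segmentation of an edge-weighted affinity
# graph via union-find. The thesis algorithm exposes four parameters; the one
# threshold here is an illustrative reduction, not the exact parameterization.
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

def segment(n_vertices, edges, threshold=0.5):
    """edges: iterable of (affinity, u, v). Merge endpoints of every edge
    whose affinity meets the threshold; components become segments."""
    uf = UnionFind(n_vertices)
    for affinity, u, v in edges:
        if affinity >= threshold:
            uf.union(u, v)
    return [uf.find(v) for v in range(n_vertices)]

# Tiny 4-vertex example: two strongly connected pairs, weakly linked.
labels = segment(4, [(0.95, 0, 1), (0.2, 1, 2), (0.92, 2, 3)])
print(labels)  # vertices 0,1 share one label; vertices 2,3 share another
```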

    Scalable algorithms for semi-automatic segmentation of electron microscopy images of the brain tissue

    Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 139-145). I present a set of fast and scalable algorithms for segmenting very large 3D images of brain tissue. Light and electron microscopy can now produce terascale 3D images within hours. Extracting information about the shapes and connectivity of neurons requires fast and accurate image segmentation algorithms. Due to the sheer size of the problem, traditional approaches might be computationally infeasible. I focus on a segmentation pipeline that breaks the segmentation problem into multiple stages, each of which can be improved independently. In the first step of the pipeline, convolutional neural networks are used to predict segment boundaries. The watershed transform is then used to obtain an over-segmentation, which is then reduced using agglomerative clustering algorithms. Finally, manual or computer-assisted proofreading is done by experts. In this thesis, I revisit the traditional approaches for training and applying convolutional neural networks, and propose:
    - A fast and scalable 3D convolutional network training algorithm suited for multi-core and many-core shared-memory machines. The two main ideas of the algorithm are: (1) minimizing the required computation by using FFT-based convolution with memoization (see the sketch below), and (2) a parallelization approach that can utilize a large number of CPUs while minimizing the required synchronization.
    - A high-throughput inference algorithm that can utilize all available computational resources, CPUs and GPUs. I introduce a set of highly parallel algorithms for different layer types and architectures, and show how to combine them to achieve very high throughput.
    Additionally, I study the theoretical properties of the watershed transform of edge-weighted graphs and propose a linear-time algorithm. I propose a set of modifications to the standard algorithm and a quasi-linear agglomerative clustering algorithm that can greatly reduce the over-segmentation produced by the standard watershed algorithm. by Aleksandar Zlateski. Ph. D.
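
    A simplified sketch of the FFT-based convolution-with-memoization idea from the first bullet above: the transform of an input volume is computed once and reused across every filter applied to it. The shapes and caching policy are illustrative, not the thesis implementation.

```python
# Sketch of FFT-based convolution with memoization: pointwise products in the
# frequency domain replace direct convolution, and each array's transform is
# cached so it is computed only once. Simplified illustration, not the thesis code.
import numpy as np

class FFTConv3D:
    def __init__(self, volume_shape):
        self.shape = volume_shape
        self._fft_cache = {}  # memoized transforms, keyed by array id for simplicity

    def _fft(self, array):
        key = id(array)
        if key not in self._fft_cache:
            # Zero-pad to the full volume shape so products are compatible.
            self._fft_cache[key] = np.fft.rfftn(array, s=self.shape)
        return self._fft_cache[key]

    def convolve(self, volume, kernel):
        # Pointwise product in the frequency domain == circular convolution.
        return np.fft.irfftn(self._fft(volume) * self._fft(kernel), s=self.shape)

conv = FFTConv3D((32, 64, 64))
volume = np.random.rand(32, 64, 64)
kernels = [np.random.rand(3, 3, 3) for _ in range(8)]
# The volume's FFT is computed once, then reused for all eight kernels.
outputs = [conv.convolve(volume, k) for k in kernels]
```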